A Structural Cognitive Approach to the Assessment
of Classroom Learning

The present paper describes a method of assessing classroom knowledge
that involves an integration of psychometric and cognitive perspectives.
Perhaps because of their different interests, these two approaches
historically have had relatively little influence on one another. Whereas
psychometricians are primarily concerned with the predictiveness of a
measure, cognitivists have been more concerned with representational models
of knowledge. In this paper we hope to show that there exists a natural
synergism between the cognitive and psychometric approaches that, when
appropriately integrated, can mutually facilitate progress towards their
respective goals. More specifically, the cognitive perspective, with its
structural assumptions regarding the representation of knowledge, can provide
the basis for some new and useful methods to assess classroom learning. The
psychometric approach, on the other hand, with its emphasis on test validity
and reliability, can provide a much needed empirical basis for models of
knowledge representation.

We begin this paper by contrasting the cognitive approach and the
psychometric approach as they are implemented in classroom assessment. We
then turn to a more detailed discussion of a structural approach to knowledge
assessment, which integrates the cognitive and psychometric perspectives
within the context of classroom learning.

Two Contrasting Perspectives on Knowledge Assessment

The psychometric approach, as applied in the classroom setting, usually
assesses knowledge with conventional essay, true-false, and multiple choice
exams. A student's performance on this type of exam is usually represented
in terms of a percentage correct. Many educators are perhaps so familiar
with this generic form of examination in their classes that they no longer
consider the assumptions underlying this "how much" approach to knowledge
assessment. By accumulating points across questions, we are assuming a kind
of independence that suggests we conceptualize knowledge as a list of
independent facts or elements. Although this criticism may be less true of
essay exams, it remains the case that using a single index, such as
percentage correct, tells us very little regarding what a student knows or
does not know.

A simple list of items may serve as an appropriate representation for
certain limited domains (e.g., the capital cities of the 50 states of this
country), but there is a great deal of empirical and theoretical work
from the cognitive literature suggesting that a list is not a valid means of
representing more complex domains of knowledge (e.g., Chi, Glaser, & Farr,
1988; Center & Collins, 1983). A commonly held and long-standing assumption
in cognitive psychology is that knowledge is organized and structured (Bower,
1975; Tulving & Donaldson, 1972; Wertheimer, 1945). From the cognitive
perspective, to be knowledgeable of a domain, one must understand the
interrelationships among the important concepts within the domain. Consistent
with this assumption, cognitive models of knowledge representation are
primarily concerned with the types of structures that organize bodies of
knowledge. In fact, the meaning of any specific concept is assumed to be
largely dependent on its interrelationships with other concepts. Although
there are a variety of structural models of knowledge in the cognitive
literature (e.g., Anderson & Bower, 1973; Collins & Quillian, 1969), most
share a central theme in assuming that the interrelations among concepts are
an essential property of knowledge.

As Shavelson and colleagues (Shavelson, 1972; Shavelson & Stanton,
1975) realized some two decades ago, this assumption regarding the
representation of knowledge has some important implications for the
assessment of classroom learning. Basically, how we assess knowledge should
be consistent with how we assume knowledge is represented. If structural
properties are an important component of knowledge representation, then our
assessment tools must measure these structural properties. Over the past few
decades, an impressive literature has accumulated indicating that the
structural properties of domain knowledge are closely related to competence
in the domain (e.g., Chase & Simon, 1973; Chi, Glaser & Rees, 1981). From
this perspective, knowledge of a domain implies at some level understanding
how the various domain concepts are interrelated. This view strongly suggests
that our methods of assessment must capture this structural component of
knowledge in order to be valid.

An obvious implication is that we should use some type of cognitive
representational model to assess an individual's knowledge of a domain. In
the next section we describe in some detail how a structurally oriented
approach to knowledge assessment can be successfully implemented. However,
before we conclude this section we need to discuss how the structural
assessment approach is mutually beneficial to the cognitive approach and the
psychometric approach as it is applied in the classroom. Its potential
benefits to the psychometric approach are threefold. First, it would more
solidly ground classroom evaluation in a context of knowledge representation
theory. Secondly, if structural aspects of knowledge are related to domain
performance, the assessment of these structural properties should improve
prediction. Finally, as will be discussed in some detail later, the
representation may be presented in the form of a visual graph that allows the
instructor to more easily identify the locus of a student's misconceptions
regarding the domain. This in turn could facilitate individualized training
intervention.

One benefit of a structural approach to assessment for cognitive
theory is that it provides an empirical basis for evaluating different
representational models of knowledge. This type of representational
validation has been largely lacking in the cognitive literature. As will
become apparent when we describe the implementation of the structural
approach, the structural representations are evaluated in terms of their
ability to predict classroom exam performance. In other words, each student
will have her unique, empirically derived representation of a knowledge
domain. Thus, predictive validity plays a central role in choosing a
theoretical representation of domain knowledge. This stands in contrast to
the methods by which most cognitive representational models are validated.
Cognitivists have been far more concerned with issues relating to the
architecture of their models of semantic memory and knowledge representation.
Among other things, these models attempt to capture the way we rapidly access
and retrieve various bits of information from memory. Experiments designed to
test these models often look at how stimulus parameters (e.g., word length)
influence response latencies. The models are intended to apply to large
populations (e.g., native English speaking adults), or specific groups (e.g.,
expert programmers), with little or no interest in individual differences.

In summary, our aim is to build some bridges between applied educational
testing and cognitive theories of knowledge representation. We believe the
schism between the two fields is unnecessary and counterproductive. It
developed, we believe, primarily out of their different interests. The
cognitivists were concerned with the development of models of cognitive
representational systems, whereas the educational assessment researchers were
more concerned with the immediate issues of validity and reliability.
Indeed, there exists a natural synergism between the two fields that could be
mutually beneficial to the progress of both. Specifically, we hope to show
that test theorists' concerns with predictiveness will benefit modeling of
cognitive structure, and the cognitivists' structural perspective will
positively influence the development of the methods used to assess domain
knowledge.

Structural Assessment: Methods and Findings

In this section we provide a general methodological overview of
structural approaches to knowledge assessment, with special emphasis on
methods we have developed over the past few years. Although not a
comprehensive review of the literature, the discussion should give the reader
a basic understanding of the structural approach, how it differs from more
conventional testing approaches, a smattering of relevant findings, and some
of the more important issues and implications viewed from the structural
perspective.

Research on structural knowledge assessment in classrooms began to
appear, primarily in educational psychology journals, in the late 1960's and
early 1970's (e.g., Johnson, 1967; 1969; Kass, 1971; Shavelson, 1972;
Shavelson & Stanton, 1975). Several investigators reported encouraging
findings, indicating that classroom performance was related to students'
structural organization of the central concepts in the course. For example,
Fenker (1975) had students in a measurement class and a design class rate the
relatedness of pairs of concepts and then transformed their ratings to an MDS
spatial representation. The students' MDS representations were then compared
with a referent representation based on the average ratings of eight experts
in each domain. He found that students' similarity to the referent structure
was correlated (r=.54) with course grades in the design course, and (r=.61)
with grades in the measurement course. Despite the generally positive outcome
of this early work, there were a number of specific methodological problems
that hampered further advances. Perhaps foremost was the lack of
quantitative methods for evaluating structural representations. We believe
that our current research has made significant progress in addressing these
issues.

Our discussion of structural assessment methods is organized in terms of
the three major steps that are involved in their implementation: (a)
elicitation - evoking some behavioral index of an individual's organization
of domain concepts; (b) representation - applying techniques that transform
the elicited data into a representation that captures the important structural
properties of domain knowledge; and (c) evaluation - quantifying the level of
knowledge or sophistication that is reflected in the representation.

Elicitation

Elicitation, as the word suggests, is the process of evoking or
extracting what a person knows about some knowledge domain. There is a wide
range of methods for eliciting knowledge, ranging from direct approaches, such
as interviews and conventional essay exams, to more indirect approaches where,
for example, knowledge may be inferred on the basis of reaction times (e.g.,
Collins & Quillian, 1969).

One important point about elicitation is that the method of elicitation
should be compatible with the cognitive model of knowledge representation.
Thus, if it is assumed that knowledge is structural in its representation, it
follows that the elicited behavior should be sensitive to the
interrelationships among the concepts. The implications of this assertion
will be better appreciated after we have discussed the elicitation,
representation, and evaluation phases of the structural approach.
For the present, it suffices to say that the elicitation procedure must
provide some indication of the relatedness between pairs of concepts. With
an appropriate representational transformation of these relatedness ratings
it should be possible to capture more global structural properties of domain
knowledge.

Although a variety of elicitation methods have been used to obtain
concept relationships, including word associations (Johnson, 1967), ordered
recall (Cooke, Durso, & Schvaneveldt, 1986), and card sorting (Shavelson &
Stanton, 1975), simply having subjects make subjective ratings of degree of
relatedness between pairs of concepts works quite well in assessing an
individual's knowledge of the interrelations among domain concepts (Fenker,
1975; Goldsmith, Johnson, & Acton, 1991). Furthermore, there may be certain
advantages to using relatedness ratings to elicit domain knowledge. First,
subjects have no difficulty using a numerical scale to express their sense of
relatedness. As a result, it is relatively simple to automate the
administration and scoring of the ratings. This allows for the objective and
efficient gathering of large amounts of relatedness data. Second, unlike essay
exams and interviews, relatedness ratings do not assume that subjects have
conscious access to all relevant knowledge. In fact, in our own work we have
found that requiring subjects to make rapid relatedness judgments on the basis
of their initial intuitions may result in more reliable and valid ratings than
allowing unlimited time.

Two questions about concept selection inevitably arise when using
relatedness judgments to assess domain knowledge, namely, how many and which
concepts should be rated? Not surprisingly, these two questions are closely
related, since the number of concepts required to obtain a valid assessment
is likely to depend on how the concepts are selected.

In deciding on the number of concepts to be rated we must consider how
the number of concepts influences the total number of pairs that are rated.
At the extremes each concept could be paired with one or all other concepts
in the list. Because some structural methods of analyzing ratings require
that data be collected on all pairwise combinations of concepts (e.g.,
Pathfinder; Schvaneveldt, 1990), we will focus the discussion on this case.
When all pairwise combinations of concepts are rated for n concepts, there
will be [n(n - 1)/2] pairwise ratings. For example, 24 concepts would result
in 276 pairs, which requires approximately 45 minutes for most students to
complete. For practical considerations, including attention span and fatigue,
this sets an upper limit of approximately 30 concepts we can expect students
to rate in a single session.
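
The pair-count arithmetic can be checked with a short sketch. The function
names and the placeholder concept labels below are illustrative assumptions,
not part of the procedure described in this paper.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n concepts: n(n - 1)/2."""
    return n * (n - 1) // 2


def all_pairs(concepts):
    """Enumerate every unordered pair of concepts to be rated."""
    return list(combinations(concepts, 2))


# Hypothetical placeholder labels standing in for 24 course concepts.
concepts = [f"concept_{i}" for i in range(1, 25)]
pairs = all_pairs(concepts)

print(len(pairs))      # 276, matching the 24-concept example in the text
print(pair_count(30))  # 435 pairs at the suggested upper limit of 30 concepts
```

Because the count grows quadratically, moving from 24 to 30 concepts adds
159 pairs to the rating task.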

In one study (Goldsmith, Johnson, & Acton, 1991) involving an
undergraduate course in design of experiments, we found that when students
rated all pairwise combinations of concepts, predictiveness of course
performance improved in a linear manner from .15 to .74 as the number of
concepts rated increased from 5 to 30. Although this suggests that more is
better, we have found with 24 concepts predictions of college classroom course
performance ranged from approximately .50 to .85 across several different
domains (cognitive psychology, computer programming, and design of
experiments).

We turn next to the question of how concepts are selected. We first
attempted to generate a fairly comprehensive list of the important concepts in
a subject by analyzing the glossary and index of relevant textbooks. We then
conferred with the course instructor to add any important concepts that were
missing. From this list we selected a sample of concepts (usually 24) that
the instructor agreed were representative of the course material.

Considerable work is left to be done on developing a set of criteria to
serve as a systematic basis for selecting concepts. One obvious criterion
proposed by Hirsch (1987) and Boneau (1990) is the concept's importance to the
domain, as judged by experts. Being knowledgeable of the most important
concepts within a domain may be sufficient if our only goal is to define some
basic level of competence, but these concepts may not adequately discriminate
among higher levels of expertise. Thus, another basis for selection would be
to select those concepts which best discriminate between levels of expertise.

Selecting concepts on the basis of their correlation with exam scores is
similar to the item selection procedure commonly used in test construction
(Anastasi, 1988). When this procedure is used in test development it applies
to specific items, whereas in the rating task the selection of a concept would
imply that it would be paired with the other n-1 concepts. Thus, item
selection may be more efficiently applied to pairs of concepts than individual
concepts.

Recently, we have found (Goldsmith & Johnson, 1990) that by selecting the
more predictive pairs, it is possible to predict classroom exam performance as
well with ratings of 100 or fewer selected pairs as with all 276 pairwise
combinations of 24 concepts. Simply in terms of prediction there
appear to be obvious benefits to employing an item selection procedure.
However, there is a cost when it comes to transforming the ratings into a
structural representation. This will become more apparent in the next
section, where we discuss the representation of the elicited knowledge.

Representation 

Once we have elicited an individual's concept interrelationships in a
domain, we must decide how to transform these raw proximities into a
representation that best models the individual's knowledge. We mention three
important criteria in choosing a representation. First, the representation
should have acceptable predictive validity. That is, we should be able to
predict an individual's level of competence in a domain at least as well with
the representation as with the untransformed ratings.

Second, the representation should be easily comprehended. One advantage
of many scaling algorithms is that they result in visual representations
depicting the organization among concepts in a manner that is relatively
easily interpreted. For example, cluster analysis represents the concepts
organized in terms of a hierarchical graph (Johnson, 1967; Milligan & Cooper,
1987). Thus one can see by visual examination how an individual organizes the
concepts within a domain.

Finally, the representation should be consistent with our theoretical
conceptions of knowledge. In the case of conventional exams we often simply
use the percentage correct to represent what an individual knows about some
domain. As argued above, this method suggests that knowledge can be
conceptualized as an accumulation of independent facts. A percentage index
estimates the proportion of information known. Although the information may
actually involve understanding certain conceptual relationships, a percentage
does not explicitly reflect the structural properties of the individual's
knowledge.

The next question is to determine which type of representation better
models the specific structural property that is assumed to be important. There
are a variety of scaling procedures that researchers have historically used to
infer the structural organization underlying similarity judgments. One of the
more frequently used methods is multidimensional scaling (MDS) (e.g., Kruskal,
1964), which represents a set of concepts in terms of an n-dimensional
Euclidean space. Other scaling algorithms such as cluster analysis (e.g.,
Johnson, 1967) and additive trees (Sattath & Tversky, 1977) result in
hierarchical graph representations. A more recently developed scaling
algorithm, Pathfinder (Schvaneveldt, 1990), also organizes the concepts into a
connected graph representation, but Pathfinder does not impose a hierarchical
solution and thereby allows greater freedom in developing an individual's
structural graph.

To provide a concrete illustration of a Pathfinder network, Figures 1
and 2 show Pathfinder solutions for an expert's and a novice's ratings of
24 concepts from a cognition and memory course. Those readers having some
background in cognitive psychology will see that, while some of the novice's
structure is quite reasonable, it reveals a number of either missing or
inappropriate relationships.

Figure 1. Pathfinder network solution to expert's ratings of 24
concepts from course on cognition and memory.

In choosing a type of representation, all of the above criteria must be
considered. If the research is theoretically motivated, the theory will suggest
the structural properties that are of primary interest, and this will
likely favor one representational approach over others. For example, there is
evidence (Holman, 1972; Pruzansky, Tversky, & Carroll, 1982) suggesting that
spatial representations, such as MDS, work better for perceptual phenomena
(e.g., color represented in terms of a three dimensional space involving hue,
saturation, and brightness), whereas network representations are better for
conceptual phenomena (e.g., a biological taxonomy of animal species).

If, on the other hand, the research has a more applied orientation then
ease of representation may play a more important role. For example, assume
the goal is to design an individualized curriculum that is aimed at addressing
specific knowledge deficits within a domain. This process could be
facilitated with the use of network representations, such as those presented
in Figure 1. By visually examining student and expert networks, it could
be determined which specific clusters or connections were missing from an
individual student's organization of a domain.

Finally, the choice of representation can be based on predictiveness.
Using this criterion, the type of representation that provides the best
prediction of domain competence is preferred. We believe that the
predictiveness criterion, if used in moderation, could have a healthy
influence on the theoretical development of cognitive representations by
forcing the representations to make more fine-grained distinctions. Many
models of knowledge representation (e.g., Collins & Quillian, 1969) are able
to make very general predictions regarding the organization of knowledge
(e.g., the attribute of singing is more closely related to canaries than is
the attribute of eating), but they fail to address individual differences in
domain competence.

There is a danger of overemphasizing predictability as a basis for
favoring a particular representational transformation. On first consideration
it may appear that predictability is a completely objective basis of
evaluating the validity of alternative representations. This assumption,
however, is only true to the extent that the external criterion that is being
predicted is an objective definition of competence. In the case of our own
work we have been using course points from classroom exams as the external
criterion. At some point we must ask ourselves if we would be happy if our
structural measure correlated perfectly with exam scores. Obviously not. The
point is, we doubt the ultimate validity of conventional exams, but we must
use them as a means of bootstrapping a new alternative. The eventual
acceptance of a structural approach to assessment will rest upon a multitude
of criteria. Thus, the overemphasis on a single criterion at this early
juncture is likely to be misguided.

In concluding our discussion of knowledge representations, it should be
apparent that research and theory in this field are still in their infancy. It
is far too early to exclude alternative representational systems from further
consideration on the basis of the preliminary data that are currently
available. We are proposing a broad scale program of research in which
different investigators will explore a variety of methods and applications.
The problems are sufficiently complex to accommodate more than a single model.

Evaluation 

The third step in knowledge assessment is to evaluate an individual's
knowledge representation. What level of sophistication or competence is
indicated by a particular representation? Clearly, we must have some means of
transforming a representation into a simple index of competence. We will
discuss two fundamentally different methods of evaluation. One approach we
call referent-based, in which the student's representation is compared against
some external standard. In referent-based evaluation some index of similarity
between the student and expert referent representation is used to predict
domain competence (e.g., classroom exam performance). The other approach to
evaluation is referent free in that the assessment refers to intrinsic
properties of the student representation.

Referent Based Evaluations. When attempting to assess domain competence,
the most obvious external standard is an expert or group of experts in the
field (Chi, Feltovich, & Glaser, 1981). In our work, when assessing college
classroom knowledge, course instructors naturally serve as experts. Often we
have averaged the instructor's ratings with those of a number of other faculty
and graduate students who have taught similar courses. We find that a referent
structure based on the averaged ratings of a number of experts is usually a
better predictor of exam scores than one based only on the ratings of the
individual instructor for the course (Acton, 1990). This finding has some
important implications. Specifically, it allows for the possibility of moving
towards an idealized referent structure that transcends the various
idiosyncrasies of individual experts. We must emphasize that the idea of an
idealized referent structure does not in any way constrain individual
creativity. The fact is, although expert structures are more similar to one
another than novice structures, each expert's organization has unique
characteristics.

Precisely how the comparison between student and expert representation is
carried out depends, in part, on the type of representation being compared.
To begin, we can take the relatedness ratings matrix itself as a raw
representation of an individual's knowledge. The most obvious and direct way
to assess the similarity between two proximity matrices is simply to compute
the correlation between the two sets of ratings. We have found this measure
of similarity to be a good predictor of classroom exam performance with
correlations between similarity and total points on exams ranging from .45 to
.83 across different semesters and different courses.
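
As a minimal sketch of this raw-ratings comparison, the code below
correlates the upper triangles of two symmetric ratings matrices. The 1-to-7
rating scale, the four-concept matrices, and all function names are
assumptions made for illustration; they are not taken from the studies cited
here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix):
    """Flatten the ratings above the diagonal (each pair counted once)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def ratings_similarity(student, referent):
    """Correlation between a student's and a referent's pairwise ratings."""
    return pearson(upper_triangle(student), upper_triangle(referent))


# Hypothetical relatedness ratings for four concepts (1 = unrelated, 7 = related).
expert = [[0, 7, 2, 1],
          [7, 0, 3, 2],
          [2, 3, 0, 6],
          [1, 2, 6, 0]]
student = [[0, 6, 3, 1],
           [6, 0, 2, 2],
           [3, 2, 0, 7],
           [1, 2, 7, 0]]

print(round(ratings_similarity(student, expert), 2))
```

In practice the referent matrix would be the average of several experts'
ratings, as described above, rather than a single expert's matrix.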

Although the correlation on raw ratings may perform quite well as a
predictor, it does not fare well on the other two criteria by which we
evaluate representations. First, a matrix of ratings is not easily
comprehended, and second, it is not motivated from any explicit theoretical
perspective. If we adopt a structural approach, we want to look at
representations and methods of comparing representations that emphasize
structural properties. Recall that our definition of structure focused on the
interrelationships among concepts, which we believe is best captured by
network representations. We also hypothesized that the meaning of an
individual concept is defined in terms of the concepts that are closely
related to it. This has some important implications for how we evaluate the
similarity between two networks.

When evaluating Pathfinder derived network representations, it is quite
possible to quantify the similarity between a student and expert network graph
by simply correlating the graph distances between respective pairs of
concepts. However, this correlational measure of similarity does not capture
the more global properties of our definition of structure (viz., a concept
which is defined by its neighbors). To overcome this limitation, we
developed (Goldsmith & Davenport, 1990) a set theoretic measure called C that
reflects the similarity in neighborhoods between two concepts. For example,
assume that concept A in a student's network is directly linked to concepts
B, C, and D, whereas concept A in the expert's network is linked to concepts B
and C. The measure C is the ratio of the size of the intersection (B and C)
over the size of the union (B, C, and D), or .67. We do this for each concept
and then simply average the ratios over all the concepts. We have found the
similarity measure C of Pathfinder networks to be a better predictor of exam
scores than correlational measures on raw proximity data, network distances,
or Euclidean distances derived from MDS scaling (Goldsmith, Johnson, & Acton,
1991).
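
The worked example for concept A can be reproduced with a small sketch of
the neighborhood ratio. This follows only the verbal description given here
(intersection over union per concept, averaged over concepts); the function
names, the edge-list input format, the toy networks, and the choice to skip
concepts that have no links in either network are assumptions, and the
published definition of C may differ in detail.

```python
def adjacency(edges):
    """Build an undirected neighbor map from a list of (a, b) links."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def concept_ratio(student_neighbors, expert_neighbors):
    """Size of the shared neighborhood over the size of the combined one."""
    union = student_neighbors | expert_neighbors
    return len(student_neighbors & expert_neighbors) / len(union)


def network_similarity(student_edges, expert_edges):
    """Average the per-concept neighborhood ratios over all linked concepts."""
    student = adjacency(student_edges)
    expert = adjacency(expert_edges)
    ratios = []
    for concept in set(student) | set(expert):
        s = student.get(concept, set())
        e = expert.get(concept, set())
        if s | e:  # assumption: skip concepts with no links in either network
            ratios.append(concept_ratio(s, e))
    return sum(ratios) / len(ratios) if ratios else 0.0


# Concept A from the text: student links B, C, D; expert links B, C.
print(round(concept_ratio({"B", "C", "D"}, {"B", "C"}), 2))  # 0.67

# Hypothetical four-concept networks used only to show the averaging step.
student_edges = [("A", "B"), ("A", "C"), ("A", "D")]
expert_edges = [("A", "B"), ("A", "C"), ("C", "D")]
print(round(network_similarity(student_edges, expert_edges), 3))
```

Two identical networks yield a value of 1, and networks with no shared links
yield 0, so the index can be read as a proportion of agreement in local
structure.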

The point is not that using C on Pathfinder networks was necessarily a
better predictor, but that our methods of assessment are consistent with our
view of domain knowledge. It is quite possible that other measures and other
domains may yield different outcomes. Although we expect that methods
emphasizing structural properties of knowledge will generally do a better job
of assessing domain knowledge, the important point is for researchers and
practitioners to adopt a coherent and theoretically principled approach to
assessment.

Referent Free Assessment. Most methods for evaluating domain knowledge
involve an external criterion or referent. For example, in conventional
testing there is the externally defined "correct answer" against which
performance is evaluated. In contrast, we might look for intrinsic properties
of behavior that are indicative of expertise. Once again, the specific
intrinsic properties we look for should be consistent with our theoretical
conceptions of domain knowledge.

In our structural approach to knowledge assessment we have assumed that a
concept's meaning is contained in its relationships to other concepts (i.e.,
its neighbors) within the domain. Therefore, if concepts A and B are
neighbors, and concepts B and C are neighbors, there is an increased
likelihood that concepts A and C are also neighbors. As an individual becomes
more knowledgeable we would expect her judgments of relatedness to become more
constrained by these neighborhood factors. How might one go about quantifying
this type of constraint? Our approach is to first use the C measure
described above to compute a derived distance between all pairs of concepts on
the basis of neighborhood similarity. Next, we compute the correlation
between the raw ratings and the derived ratings for all pairs of concepts. We
call this measure coherence. We have found coherence to be a reliable
predictor of students' classroom knowledge. In addition, coherence increases
across levels of expertise ranging from naive student to knowledgeable
undergraduate to graduate student to professor (Acton, 1990).

Another type of referent free property of relatedness ratings is the
consistency with which repeated pairs of concepts are rated. In our rating
task we usually repeat approximately 10% of the pairs, and then compute the
correlation between repeated ratings for each individual. We find that this
index of reliability is significantly correlated with exam performance. Not
surprisingly, it is easier to be consistent when you are knowledgeable of the
concepts you are rating.
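
The repeated-pairs reliability index might be computed along the following
lines. This is a sketch under stated assumptions: the random choice of which
pairs to repeat, the fixed seed, the 1-to-7 scale, the simulated second
ratings, and every function name are hypothetical and are not described in
the source.

```python
import random
from itertools import combinations
from math import sqrt


def choose_repeats(pairs, fraction=0.10, seed=0):
    """Pick roughly `fraction` of the pairs to present a second time."""
    rng = random.Random(seed)
    k = max(2, round(len(pairs) * fraction))
    return rng.sample(pairs, k)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return sxy / sqrt(sxx * syy)


def retest_reliability(first, second):
    """Correlate first and second ratings over the pairs rated twice."""
    shared = sorted(set(first) & set(second))
    return pearson([first[p] for p in shared], [second[p] for p in shared])


# 24 concepts give 276 pairs; about 10% of them are repeated.
pairs = list(combinations(range(24), 2))
repeats = choose_repeats(pairs)

# Simulated ratings for one hypothetical student on the repeated pairs.
first = {p: (i % 7) + 1 for i, p in enumerate(repeats)}
second = {p: min(7, r + 1) if i % 2 == 0 else r
          for i, (p, r) in enumerate(first.items())}

print(len(repeats))
print(round(retest_reliability(first, second), 2))
```

Each student receives one such reliability value, which can then be
correlated with exam performance across the class.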

To summarize, we have proposed two methods of evaluation, referent based
and referent free. In the case of referent based evaluation we noted the
advantages of using expert referent representations based on the averaged
ratings of several experts and alternative methods of quantifying the
similarity between two representations. In our discussion of referent free
methods we introduced the measure of coherence, which reflects internal
consistency of the ratings. It was noted that reliability may also be used as
a referent free evaluation. The ideal "good" student is realized when all
three measures (C, coherence, and reliability) are high.

Implications for Curriculum Design and Instruction 
The value of assessment is contained in how it is used. If it goes no 
further than informing a student that she is in the bottom quartile of the 
class it is of little constructive value. Therefore, it is appropriate to 
consider some of the important implications of the structural approach for the 
design of curriculum and methods of instruction. 

Because the structural approach that we have proposed involves a 
comparison between student and expert network representations, it permits the 
identification of organizational differences at any level of detail. We can go 
from looking for the presence or absence of specific links between concepts 
to looking at more global organizational properties of the two networks. This 
offers the possibility of providing students with extremely comprehensive 
feedback; however, it raises the question of how the feedback is to be used. 
More to the point, what are the instructional implications of differences 
between student and expert networks? 

On the one hand, it is relevant to know that a majority of students in 
your class do not see the relationship among a certain cluster of concepts on 
which you have just completed lecturing. Clearly, it is important to have 
identified this subset of students, but given this information, what do you do 
about the apparent deficit in their knowledge? It is unlikely that the deficit 
can be corrected by simply informing the students that concepts A, B, C, and D 
are all closely related. Presumably they need more information on how these 
concepts are interrelated, and when that information is provided in an 
appropriate manner we will see the changes in their network representations. 
Some support for this is provided in a study by Brown and Stanners (1983). 
They showed that an MDS representation of a student's organization of concepts 
in an introductory psychology class could be modified by focused training on a 
small subset of concepts. The training involved having students make the 
rating judgments, then publicly defend their ratings to the class and the 
instructor. In some instances the instructor would then spend several minutes 
discussing the relationship between specific pairs of concepts. 

Another potential advantage of adopting a cognitive structural approach 
to assessment is that the students can be given an objective goal that has 
face validity and is theoretically grounded. Moreover, the referent structure 
itself, represented as a graphic network of interconnected concepts, can 
serve as a type of organizational schema for readings and lectures. Unlike 
the conventional outline that forces a linear organization, a network 
structure can explicitly represent all the important relationships that need 
to be grasped. With computer software environments such as hypertext it 
would be possible to implement the empirically derived structure of experts 
within a domain (Jonassen, 1988). This would allow for intelligent nonlinear 
browsing through the domain by novices. 

General Conclusion and Summary 
Our primary motivation in writing the paper was to facilitate 
communication between traditional test theory and cognitive theory. The 
central theme addressed the relation between how knowledge is represented and 
how it is assessed. If our representation of knowledge is organized or 
structured then our assessment of knowledge must capture this structure and 
our instruction must reflect the structure. We then outlined how a structural 
approach to assessment could be implemented and summarized some of the 
encouraging findings in the area. 

In closing, we quickly summarize some of the advantages of the structural 
approach to assessment. First, a most basic requirement of any assessment 
technique is that it can be applied to individuals, as can be done with the 
structural approach. Second, the administration and scoring are completely 
objective and efficient. Once the concepts or pairs have been selected the 
entire process can be easily automated on computers. In regard to ease of 
administration it should also be noted that the program that presents the 
pairs always randomizes the order of presentation for each subject, thus 
minimizing order effects and the risk of cheating when administered in groups. 
Also, it is a simple matter to create multiple versions of the rating task by 
changing a proportion of the concepts that are paired. This, of course, allows 
repeated administrations of the task over the duration of a course, which 
would provide a picture of structural change as learning progresses. Third, 
although the knowledge that directs our judgments of relatedness is sometimes 
entirely explicit, it appears, on the basis of students' introspections, that 
the judgments are often intuitively based and dependent on implicit knowledge. 
In this regard the approach may nicely complement some conventional exams 
(e.g., essay) that depend more on explicit knowledge. Fourth, the results not 
only indicate how much a student knows (e.g., relative similarity to an 
expert referent structure), but also what specific relationships are 
misunderstood, and whether the individual is internally consistent 
(i.e., coherent) in her judgments of relatedness. Fifth, and most important 
in our opinion, the entire process, involving both training and assessment, 
is grounded in a common theoretical framework. This should foster greater 
communication and compatibility between the historically distant areas of 
psychometric assessment and cognitive theories of representation. Both should 
benefit from this common orientation. 